Training Agents to Perform Sequential Behavior
Authors
Abstract
This paper is concerned with training an agent to perform sequential behavior. In previous work we applied reinforcement learning techniques to control a reactive robot. A purely reactive system is limited in the kinds of interactions it can learn. In particular, it can only learn what we call pseudo-sequences, that is, sequences of actions in which the transition signal is generated by the appearance of a sensorial stimulus. We discuss the difference between pseudo-sequences and proper sequences, and the implications that this difference has for training procedures. A result of our research is that, in the case of proper sequences, learning succeeds only if the agent has some kind of memory; moreover, it is often necessary to let the trainer and the learner communicate. We therefore study the influence of communication on the learning process. First we consider trainer-to-learner communication, introducing the concept of a reinforcement sensor, which lets the learning robot know explicitly whether the last reinforcement was a reward or a punishment; we also show how the use of this sensor induces the creation of a set of error recovery rules. Then we introduce learner-to-trainer communication, which is used to disambiguate indeterminate training situations, that is, situations in which observation of the learner's behavior alone does not give the trainer enough information to decide whether the learner is making a right or a wrong move. All the design choices we make are discussed and compared by means of experiments in a simulated world.
* This work has been partly supported by the Italian National Research Council, under the "Progetto Finalizzato Sistemi Informatici e Calcolo Parallelo", subproject 2 "Processori dedicati", and under the "Progetto Finalizzato Robotica", subproject 2 "Tema: ALPI".
+ Progetto di Intelligenza Artificiale e Robotica, Dipartimento di Elettronica e Informazione, Politecnico di Milano, Piazza Leonardo da Vinci, 32, 20133 Milano, Italy.
# International Computer Science Institute, Berkeley, CA 94704, and Progetto di Intelligenza Artificiale e Robotica, Dipartimento di Elettronica e Informazione, Politecnico di Milano, Piazza Leonardo da Vinci, 32, 20133 Milano, Italy.
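The abstract's claim that proper sequences require memory can be illustrated with a toy example. The sketch below is not the authors' learning system; it assumes a hypothetical one-bit sensor, two hypothetical actions "A" and "B", and reads a proper sequence as one whose transition is not announced by any change in the sensor. It enumerates every memoryless (reactive) policy and checks whether any of them can produce the target behavior "AAABBB".

```python
from itertools import product

ACTIONS = ("A", "B")  # hypothetical action names, for illustration only


def reactive_policy(table):
    """Memoryless agent: the action depends only on the current observation."""
    def act(obs, memory):
        return table[obs], memory  # memory is ignored and left unchanged
    return act


def counting_policy(switch_after):
    """Agent with memory: a step counter decides when to switch from A to B."""
    def act(obs, memory):
        action = "A" if memory < switch_after else "B"
        return action, memory + 1
    return act


def run(policy, observations):
    """Feed the observations to the policy and return the actions as a string."""
    memory = 0
    actions = []
    for obs in observations:
        action, memory = policy(obs, memory)
        actions.append(action)
    return "".join(actions)


def all_reactive_outputs(observations):
    """Behaviors reachable by any reactive policy on this observation stream."""
    sensor_values = sorted(set(observations))
    outputs = set()
    for choice in product(ACTIONS, repeat=len(sensor_values)):
        table = dict(zip(sensor_values, choice))
        outputs.add(run(reactive_policy(table), observations))
    return outputs


TARGET = "AAABBB"
with_stimulus = [0, 0, 0, 1, 1, 1]     # sensor changes at the transition
without_stimulus = [0, 0, 0, 0, 0, 0]  # nothing in the world marks the transition

# Pseudo-sequence: some reactive policy reproduces the target.
print(TARGET in all_reactive_outputs(with_stimulus))
# Proper sequence: no reactive policy can, because the input never changes.
print(TARGET in all_reactive_outputs(without_stimulus))
# An agent with a counter as memory reproduces it without any stimulus.
print(run(counting_policy(3), without_stimulus) == TARGET)
```

When the sensor never changes, a reactive policy can only repeat a single action, so the switch from A to B has to come from internal state. The counter used here is just one simple form such memory could take.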
Similar resources
Training a Tetris agent via interactive shaping: a demonstration of the TAMER framework
As computational learning agents continue to improve their ability to learn sequential decision-making tasks, a central but largely unfulfilled goal is to deploy these agents in real-world domains in which they interact with humans and make decisions that affect our lives. People will want such interactive agents to be able to perform tasks for which the agent’s original developers could not pr...
Multiagent Supervised Training with Agent Hierarchies and Manual Behavior Decomposition
We present a supervised learning from demonstration system capable of training stateful and recurrent behaviors, both in the single agent and multiagent case. Furthermore, behavior complexity due to statefulness and multiple agents can result in a high dimensional learning space, which can require many samples to learn properly. Our approach, which relies heavily on both per-agent behavior deco...
Are People Successful at Learning Sequential Decisions on a Perceptual Matching Task?
Sequential decision-making tasks are commonplace in our everyday lives. We report the results of an experiment in which human subjects were trained to perform a perceptual matching task, an instance of a sequential decision-making task. We use two benchmarks to evaluate the quality of subjects’ learning. One benchmark is based on optimal performance as defined by a dynamic programming procedure...
Interactive Learning for Sequential Decisions and Predictions
Sequential prediction problems arise commonly in many areas of robotics and information processing: e.g., predicting a sequence of actions over time to achieve a goal in a control task, interpreting an image through a sequence of local image patch classifications, or translating speech to text through an iterative decoding procedure. Learning predictors that can reliably perform such sequential...
A model to predict the sequential behavior of healthy blood donors using data mining
This article has no abstract.
Journal title:
- Adaptive Behaviour
Volume 2, Issue
Pages -
Publication date 1994